Results 1 - 19 of 19
1.
Proteomics ; 23(23-24): e2200494, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37863817

ABSTRACT

Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Because their complex structures and properties make them difficult to analyze experimentally, computational methods have emerged as a powerful tool for studying them. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based feature generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, PLMs such as BERT have shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with feature generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications for drug discovery and personalized medicine.
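
The traditional features named in this abstract (amino acid composition and pair composition) can be illustrated with a minimal Python sketch. The function names and the 20-letter alphabet ordering are assumptions made for illustration and are not taken from the review.

```python
from collections import Counter
from itertools import product

# Assumed alphabet: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def pair_composition(seq: str) -> list[float]:
    """Fraction of each ordered pair of adjacent residues in seq (400 values)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = sum(pairs[k] for k in keys)
    return [pairs[k] / total if total else 0.0 for k in keys]
```

Both vectors have a fixed length regardless of sequence length, which is what makes them convenient classifier inputs. They discard residue order beyond adjacent pairs, which is the limitation the abstract points to.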


Subject(s)
Algorithms; Membrane Proteins; Animals; Horses; Sequence Alignment; Amino Acid Sequence; Sequence Analysis, Protein; Computational Biology/methods
2.
J Magn Reson Imaging ; 57(3): 740-749, 2023 03.
Article in English | MEDLINE | ID: mdl-35648374

ABSTRACT

BACKGROUND: Timely diagnosis of meniscus injuries is key for preventing knee joint dysfunction and improving patient outcomes because it decreases morbidity and facilitates treatment planning. PURPOSE: To train and evaluate a deep learning model for automated detection of meniscus tears on knee magnetic resonance imaging (MRI). STUDY TYPE: Bicentric retrospective study. SUBJECTS: In total, 584 knee MRI studies, divided among training (n = 234), testing (n = 200), and external validation (n = 150) data sets, were used in this study. The public data set MRNet was used as a second external validation data set to evaluate the performance of the model. SEQUENCE: A 3 T, coronal, and sagittal images from T1-weighted proton density (PD) fast spin-echo (FSE) with fat saturation and T2-weighted FSE with fat saturation sequences. ASSESSMENT: The detection system for meniscus tear was based on the improved YOLOv4 model with Darknet-53 as the backbone. The performance of the model was also compared with that of three radiologists of varying levels of experience. The determination of the presence of a meniscus tear from surgery reports was used as the ground truth for the images. STATISTICAL TESTS: Sensitivity, specificity, prevalence, positive predictive value, negative predictive value, accuracy, and receiver operating characteristic curve were used to evaluate the performance of the detection model. Two-way analysis of variance, Wilcoxon signed-rank test, and Tukey's multiple tests were used to evaluate differences in performance between the model and radiologists. RESULTS: The overall accuracies for detecting meniscus tears using our model on the internal testing, internal validation, and external validation data sets were 95.4%, 95.8%, and 78.8%, respectively. One radiologist had significantly lower performance than our model in detecting meniscal tears (accuracy: 0.9025 ± 0.093 vs. 0.9580 ± 0.025). 
DATA CONCLUSION: The proposed model had high sensitivity, specificity, and accuracy for detecting meniscus tears on knee MRIs. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.
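
The statistical measures listed in this abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions. The function name and the counts in the usage note are hypothetical and are not data from the study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard diagnostic measures from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard against empty classes).
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "prevalence": (tp + fn) / total,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,
    }
```

For example, with the hypothetical counts `diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)`, sensitivity is 0.9, specificity is 0.95, and accuracy is 0.925.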


Subject(s)
Meniscus; Tibial Meniscus Injuries; Humans; Retrospective Studies; Menisci, Tibial; Tibial Meniscus Injuries/diagnostic imaging; Tibial Meniscus Injuries/pathology; Arthroscopy; Knee Joint/pathology; Magnetic Resonance Imaging/methods; Sensitivity and Specificity; Neural Networks, Computer
3.
J Chem Inf Model ; 62(19): 4820-4826, 2022 10 10.
Article in English | MEDLINE | ID: mdl-36166351

ABSTRACT

Background: SNARE proteins play a vital role in membrane fusion, cellular physiology, and pathological processes. Many potential therapeutics based on SNAREs have also been developed for mental diseases and even cancer. Therefore, there is a pressing need for new and efficient approaches to predict SNAREs so that these essential proteins can be further manipulated. Methods: Some computational frameworks have been proposed to overcome the hurdles of biological methods, which require considerable time and budget to identify SNAREs. However, the performance of existing frameworks was unsatisfactory, as they failed to retain the SNARE sequence order and to capture the many hidden features of SNAREs. This paper proposes a novel model built on a multiscan convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to address these limitations. We trained our model on the benchmark dataset with fivefold cross-validation and applied it to two different independent datasets. Results: Overall, the multiscan CNN was cross-validated on the training set and excelled in SNARE classification, reaching 0.963 in AUC and 0.955 in AUPRC. In addition, with sensitivity, specificity, accuracy, and MCC of 0.842, 0.968, 0.955, and 0.767, respectively, our proposed framework outperformed previous models in the SNARE recognition task. Conclusions: We believe that our model can contribute to discriminating SNARE proteins from general proteins.


Subject(s)
Neural Networks, Computer; SNARE Proteins; Position-Specific Scoring Matrices
4.
Comput Biol Chem ; 99: 107732, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35863177

ABSTRACT

A promoter is a sequence of DNA that initiates the process of transcription and regulates when and where genes are expressed in the organism. Because of the importance of promoters in molecular biology, identifying them is a challenging task that can provide useful information about their functions and related diseases. Several computational models have been developed over the past decade to predict promoters from high-throughput sequencing. Although some useful predictors have been proposed, shortfalls remain in those models, and there is an urgent need to enhance predictive performance to meet practical requirements. In this study, we proposed a novel architecture that incorporated transformer natural language processing (NLP) and explainable machine learning to address this problem. More specifically, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model was employed to encode DNA sequences, and SHapley Additive exPlanations (SHAP) analysis served as a feature selection step to identify the top-ranked BERT encodings. In the last stage, different machine learning classifiers were implemented to learn the top features and produce the prediction outcomes. This study predicted not only the DNA promoters but also their activities (strong or weak promoters). Overall, several experiments showed an accuracy of 85.5% and 76.9% for these two levels, respectively. Our performance was superior to that of previously published predictors on the same dataset in most metrics. We named our predictor BERT-Promoter, and it is freely available at https://github.com/khanhlee/bert-promoter.


Subject(s)
Machine Learning; Natural Language Processing; DNA; Promoter Regions, Genetic/genetics
5.
Mol Inform ; 41(9): e2100271, 2022 09.
Article in English | MEDLINE | ID: mdl-35322557

ABSTRACT

In cellular transportation mechanisms, the movement of ions across the cell membrane and its proper control are important for cells, especially for life processes. Ion transporters/pumps and ion channel proteins work as border guards controlling the incessant traffic of ions across cell membranes. We revisited the classification of transporters and ion channels from membrane proteins with a more efficient deep learning approach. Specifically, we applied multi-window scanning filters of convolutional neural networks to almost full-length position-specific scoring matrices to extract useful information. In this way, we were able to retain important evolutionary information of the proteins. Our experimental results show that a convolutional neural network with a minimum number of convolutional layers can be enough to extract the conserved information of proteins, which leads to higher performance. Our best prediction models were obtained after examining different data imbalance handling techniques and different protein encoding methods. We also showed that our models were superior to traditional deep learning approaches on the same datasets as well as to other machine learning classification algorithms.


Subject(s)
Algorithms; Neural Networks, Computer; Ions; Membrane Proteins; Position-Specific Scoring Matrices
6.
Proteins ; 90(7): 1486-1492, 2022 07.
Article in English | MEDLINE | ID: mdl-35246878

ABSTRACT

Protein multiple sequence alignment information has long been an important feature for inferring the functions of proteins from related sequences with known functions. It is therefore one of the underlying ideas of AlphaFold 2, a breakthrough study and model for predicting the three-dimensional structures of proteins from their primary sequence. Our study used protein multiple sequence alignment information in the form of position-specific scoring matrices as input. We also refined the use of a convolutional neural network, a well-known deep-learning architecture with impressive achievements on image and image-like data. Specifically, we revisited the prediction of adenosine triphosphate (ATP)-binding sites with more efficient convolutional neural networks. We applied multiple convolutional window scanning filters of a convolutional neural network to position-specific scoring matrices to extract as much useful information as possible. Furthermore, only the most specific motifs are retained at each feature map output through the one-max pooling layer before going to the next layer. We assumed that this would help us retain the most conserved motifs, which are discriminative information for prediction. Our experimental results show that a convolutional neural network with not too many convolutional layers can be enough to extract the conserved information of proteins, which leads to higher performance. Our best prediction models were obtained after examining them with different hyper-parameters. Our experimental results showed that our models were superior to the traditional use of convolutional neural networks on the same datasets as well as to other machine-learning classification algorithms.
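
The combination of multi-window scanning and one-max pooling described in this abstract can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' network: it omits bias terms, activation functions, and training, and the function names are hypothetical. It also assumes each kernel is no longer than the matrix it scans.

```python
def scan(pssm: list[list[float]], kernel: list[list[float]]) -> list[float]:
    """Slide a (w x C) kernel along an (L x C) matrix; return L - w + 1 responses."""
    w = len(kernel)
    cols = len(kernel[0])
    return [
        sum(pssm[i + r][c] * kernel[r][c] for r in range(w) for c in range(cols))
        for i in range(len(pssm) - w + 1)
    ]


def multi_window_one_max(pssm: list[list[float]],
                         kernels: list[list[list[float]]]) -> list[float]:
    """Apply kernels of different window sizes; keep the strongest response of each."""
    return [max(scan(pssm, k)) for k in kernels]
```

Each kernel contributes exactly one number, its maximum response anywhere along the sequence, so the output length depends on the number of kernels and not on the protein length. A real PSSM would have 20 columns, while the sketch works for any column count.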


Subject(s)
Adenosine Triphosphate; Carrier Proteins; Algorithms; Binding Sites; Machine Learning; Neural Networks, Computer; Proteins/chemistry
7.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1235-1244, 2022.
Article in English | MEDLINE | ID: mdl-32750894

ABSTRACT

Living organisms receive necessary energy substances directly from cellular respiration. The completion of electron storage and transportation requires the process of cellular respiration with the aid of electron transport chains. Therefore, deciphering electron transport proteins is needed. Identifying these proteins with high performance depends directly on the choice of feature extraction method and machine learning algorithm. In this study, protein sequences were treated as natural language sentences composed of words. The proposed word embedding-based feature sets, which rely on word embedding modulation and protein motif frequencies, were useful for feature selection. Five word embedding types and a variety of conjoint features were examined for this feature selection. The support vector machine algorithm was then employed to perform classification. The performance statistics within the 5-fold cross-validation, including average accuracy, specificity, sensitivity, and MCC, surpass 0.95. In the independent test, these metrics are 96.82, 97.16, 95.76 percent, and 0.9, respectively. Compared to state-of-the-art predictors, the proposed method achieves better performance on all metrics, indicating its effectiveness in identifying electron transport proteins. Furthermore, this study provides insights into the applicability of various word embeddings for understanding the surveyed sequences.


Subject(s)
Carrier Proteins; Computational Biology; Computational Biology/methods; Electron Transport; Electrons; Support Vector Machine
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34472594

ABSTRACT

In the past decade, convolutional neural networks (CNNs) have been used as powerful tools by scientists to solve visual data tasks. However, many efforts to apply CNNs to protein function prediction and to extracting useful information from protein sequences have certain limitations. In this research, we propose a new method to address the weaknesses of the previous method. mCNN-ETC is a deep learning model that transforms protein evolutionary information into image-like data composed of 20 channels, which correspond to the 20 amino acids in the protein sequence. We constructed CNN layers with different scanning windows in parallel to enhance the ability of the proposed model to detect useful patterns. Then we filtered specific patterns through the 1-max pooling layer before inputting them into the prediction layer. This research attempts to solve a basic problem in biology in terms of application: predicting electron transporters and classifying their corresponding complexes. The model reached an accuracy of 97.41%, which was nearly 6% higher than its predecessor. We have also published a web server at http://bio219.bioinfo.yzu.edu.tw, which can be used for research purposes free of charge.


Subject(s)
Electrons; Neural Networks, Computer; Amino Acid Sequence; Biological Evolution; Humans; Proteins/chemistry
9.
Methods ; 204: 199-206, 2022 08.
Article in English | MEDLINE | ID: mdl-34915158

ABSTRACT

As one of the most common post-transcriptional epigenetic modifications, N6-methyladenine (6mA) plays an essential role in various cellular processes and disease pathogenesis. Therefore, accurately identifying 6mA modifications is necessary for a deep understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models were developed with small training datasets. Hence, their practical application in genome-wide detection is quite limited. To overcome the existing limitations, we present a novel model based on transformer architecture and deep learning to identify DNA 6mA sites from the cross-species genome. The model was constructed on a benchmark dataset and explores a feature derived from pre-trained transformer word embedding approaches. Subsequently, a convolutional neural network was employed to learn the generated features and produce the prediction outcomes. As a result, our predictor achieved excellent performance in the independent test, with an accuracy and Matthews correlation coefficient (MCC) of 79.3% and 0.58, respectively. Overall, it achieved better accuracy than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, our model is expected to assist biologists in accurately identifying 6mA sites and formulating novel testable biological hypotheses. We also release source code and datasets freely at https://github.com/khanhlee/bert-dna for front-end users.


Subject(s)
Genome; Neural Networks, Computer; DNA/genetics; Epigenesis, Genetic; Software
10.
Comput Biol Med ; 131: 104259, 2021 04.
Article in English | MEDLINE | ID: mdl-33581474

ABSTRACT

Recently, language representation models have drawn a lot of attention in the field of natural language processing (NLP) due to their remarkable results. Among them, BERT (Bidirectional Encoder Representations from Transformers) has proven to be a simple, yet powerful language model that has achieved novel state-of-the-art performance. BERT adopted the concept of contextualized word embeddings to capture the semantics and context in which words appear. We utilized pre-trained BERT models to extract features from protein sequences for discriminating three families of glucose transporters: the major facilitator superfamily of glucose transporters (GLUTs), the sodium-glucose linked transporters (SGLTs), and the sugars will eventually be exported transporters (SWEETs). We treated protein sequences as sentences and transformed them into fixed-length meaningful vectors where a 768- or 1024-dimensional vector represents each amino acid. We observed that BERT-Base and BERT-Large models improved the performance by more than 4% in terms of average sensitivity and Matthews correlation coefficient (MCC), indicating the efficiency of this approach. We also developed a bidirectional transformer-based protein model (TransportersBERT) for comparison with existing pre-trained BERT models.


Subject(s)
Glucose Transport Proteins, Facilitative; Natural Language Processing; Glucose; Language; Semantics
11.
Comput Biol Med ; 131: 104258, 2021 04.
Article in English | MEDLINE | ID: mdl-33601085

ABSTRACT

The electron transport chain is a series of protein complexes embedded in the process of cellular respiration, which is an important process to transfer electrons and other macromolecules throughout the cell. Identifying Flavin Adenine Dinucleotide (FAD) binding sites in the electron transport chain is vital since it helps biological researchers precisely understand how electrons are produced and transported in cells. This study distills and analyzes the contextualized word embedding from pre-trained BERT models to explore similarities between natural language and protein sequences. We therefore propose a new approach based on Pre-training of Bidirectional Encoder Representations from Transformers (BERT), Position-specific Scoring Matrix profiles (PSSM), and the Amino Acid Index database (AAIndex) to predict FAD-binding sites from transport proteins recently found in nature. Our proposed approach achieves 85.14% accuracy and improves accuracy by 11%, with a Matthews correlation coefficient of 0.39, compared to the previous method on the same independent set. We also deploy a web server that identifies FAD-binding sites in electron transporters, available for academics at http://140.138.155.216/fadbert/.


Subject(s)
Amino Acids; Flavin-Adenine Dinucleotide; Amino Acid Sequence; Binding Sites; Electric Power Supplies; Flavin-Adenine Dinucleotide/metabolism
12.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33539511

ABSTRACT

Recently, language representation models have drawn a lot of attention in the natural language processing field due to their remarkable results. Among them, bidirectional encoder representations from transformers (BERT) has proven to be a simple, yet powerful language model that achieved novel state-of-the-art performance. BERT adopted the concept of contextualized word embedding to capture the semantics of words and the context in which they appear. In this study, we present a novel technique that incorporates a BERT-based multilingual model in bioinformatics to represent the information of DNA sequences. We treated DNA sequences as natural sentences and then used BERT models to transform them into fixed-length numerical matrices. As a case study, we applied our method to DNA enhancer prediction, which is a well-known and challenging problem in this field. We then observed that our BERT-based features improved by more than 5-10% in terms of sensitivity, specificity, accuracy and Matthews correlation coefficient compared to the current state-of-the-art features in bioinformatics. Moreover, advanced experiments show that deep learning (as represented by 2D convolutional neural networks; CNN) holds potential in learning BERT features better than other traditional machine learning techniques. In conclusion, we suggest that BERT and 2D CNNs could open a new avenue in biological modeling using sequence information.


Subject(s)
Computational Biology/methods; DNA/genetics; Deep Learning; Enhancer Elements, Genetic; Models, Biological; Natural Language Processing; Computer Simulation; Data Accuracy; Humans; Multilingualism; Semantics; Sensitivity and Specificity; Transcription, Genetic
13.
BMC Med Genomics ; 13(Suppl 10): 155, 2020 10 22.
Article in English | MEDLINE | ID: mdl-33087125

ABSTRACT

BACKGROUND: Cytokines are a class of small proteins that act as chemical messengers and play a significant role in essential cellular processes including immunity regulation, hematopoiesis, and inflammation. As one important family of cytokines, tumor necrosis factors are associated with the regulation of various biological processes such as proliferation and differentiation of cells, apoptosis, lipid metabolism, and coagulation. These cytokines are also implicated in various diseases such as insulin resistance, autoimmune diseases, and cancer. Considering the interdependence between this kind of cytokine and others, distinguishing tumor necrosis factors from other cytokines is a challenge for biological scientists. METHODS: In this research, we employed a word embedding technique to create hybrid features, which were shown to efficiently identify tumor necrosis factors given cytokine sequences. We segmented each protein sequence into protein words and created a corresponding word embedding for each word. Then, a word embedding-based vector for each sequence was created and input into machine learning classification models. When extracting feature sets, we not only varied the segmentation sizes of the protein sequence but also tried different combinations of split grams to find the features that generated the best prediction. Furthermore, our methodology follows a well-defined procedure to build a reliable classification tool. RESULTS: With our proposed hybrid features, prediction models obtain more promising performance compared to seven prominent sequence-based feature types. Results from 10 independent runs on the surveyed dataset show that, on average, our optimal models obtain an area under the curve of 0.984 and 0.998 on 5-fold cross-validation and independent test, respectively. CONCLUSIONS: These results show that biologists can use our model to identify tumor necrosis factors from other cytokines efficiently. Moreover, this study shows that natural language processing techniques can reasonably be applied to help biologists solve bioinformatics problems efficiently.
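
The segmentation step in the METHODS section can be illustrated with a short Python sketch. The abstract does not state whether the protein words overlap or how word vectors are combined, so the overlapping n-grams and the simple averaging used here are assumptions, as are the function names and the `embeddings` lookup table.

```python
def to_words(seq: str, n: int) -> list[str]:
    """Split seq into overlapping n-grams ("protein words"); overlap is an assumption."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def sequence_vector(seq: str, n: int,
                    embeddings: dict[str, list[float]], dim: int) -> list[float]:
    """Average the vectors of all words present in a hypothetical embedding table."""
    vectors = [embeddings[w] for w in to_words(seq, n) if w in embeddings]
    if not vectors:
        return [0.0] * dim  # no known word: fall back to a zero vector
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
```

Varying `n` corresponds to the different segmentation sizes mentioned in the abstract. Concatenating the vectors obtained for several values of `n` would be one way to combine split grams.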


Subject(s)
Computational Biology; Machine Learning; Tumor Necrosis Factors/metabolism; Amino Acid Sequence; Humans; Natural Language Processing; Tumor Necrosis Factors/chemistry
14.
Mol Inform ; 39(10): e2000033, 2020 10.
Article in English | MEDLINE | ID: mdl-32598045

ABSTRACT

We herein proposed a novel approach based on the language representation learning method to categorize electron complex proteins into 5 types. The idea stems from the shared characteristics of human language and protein sequence language; thus, advanced natural language processing techniques were used to extract useful features. Specifically, we employed transfer learning and word embedding techniques to analyze electron complex sequences and create efficient feature sets before using a support vector machine algorithm to classify them. During the 5-fold cross-validation processes, seven types of sequence-based features were analyzed to find the optimal features. On average, our final classification models achieved accuracy, specificity, sensitivity, and MCC of 96%, 96.1%, 95.3%, and 0.86, respectively, on cross-validation data. For the independent test data, the corresponding performance scores are 95.3%, 92.6%, 94%, and 0.87. We concluded that, using features extracted with these representation learning methods, the prediction performance of a simple machine learning algorithm is on par with that of the existing deep neural network method on the task of categorizing electron complexes, while offering a much faster way to generate features. Furthermore, the results also showed that combining features learned from the representation learning methods with sequence motif counts helps yield better performance.
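
The 5-fold cross-validation mentioned in this abstract partitions the data so that each sample is held out exactly once. The following is a minimal sketch of the index bookkeeping. The function name is hypothetical, and the sketch does not shuffle or stratify the folds, both of which are common in practice.

```python
from typing import Iterator


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; every sample is in exactly one test fold."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

A model would be fitted on each `train` list and scored on the matching `test` list, with the k scores averaged to obtain cross-validation figures of the kind the abstract reports.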


Subject(s)
Computational Biology/methods; Multiprotein Complexes/classification; Multiprotein Complexes/metabolism; Amino Acid Sequence; Electron Transport; Humans; Natural Language Processing; Support Vector Machine; Word Processing
15.
Anal Biochem ; 577: 73-81, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31022378

ABSTRACT

Membrane transport proteins and their substrate specificities play crucial roles in various cellular functions. Identifying the substrate specificities of membrane transport proteins is closely related to protein-target interaction prediction, drug design, membrane recruitment, and dysregulation analysis, and is thus an important problem for bioinformatics researchers. In this study, we applied the word embedding approach, a main driver of the recent breakthroughs in natural language processing, to protein sequences of transporters. We defined each protein sequence based on the word embeddings and frequencies of its biological words. The protein features were then fed into machine learning models for prediction. We also varied the lengths of the biological words that make up a protein sequence to find the optimal length, which generated the most discriminative feature set. Compared to four other feature types created from protein sequences, our proposed features help prediction models yield superior performance. Our best models reach an average area under the curve of 0.96 and 0.99 on the 5-fold cross-validation and the independent test, respectively. With this result, our study can help biologists identify transporters based on substrate specificities and provides a basis for further research that enriches the field of applying natural language processing techniques in bioinformatics.


Subject(s)
Computational Biology/methods; Membrane Transport Proteins/chemistry; Amino Acid Sequence; Humans; Natural Language Processing; Substrate Specificity; Support Vector Machine
16.
J Bioinform Comput Biol ; 17(1): 1950005, 2019 02.
Article in English | MEDLINE | ID: mdl-30866734

ABSTRACT

Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predict GTP binding sites in Rab proteins, which is one of the most vital molecular functions in life science. A functional loss of GTP binding sites in Rab proteins has been implicated in a variety of human diseases (choroideremia, intellectual disability, cancer, Parkinson's disease). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases and designing drug targets. Our deep learning model, with a two-dimensional convolutional neural network and position-specific scoring matrix profiles, could identify GTP binding residues with a sensitivity of 92.3%, specificity of 99.8%, accuracy of 99.5%, and MCC of 0.92 on an independent dataset. Compared with other published works, this approach achieved a significant improvement. With this study, we provide an effective model for predicting GTP binding sites in Rab proteins and a basis for further research that can apply deep learning in bioinformatics, especially in nucleotide binding site prediction.


Subject(s)
Guanosine Triphosphate/metabolism; Neural Networks, Computer; rab GTP-Binding Proteins/chemistry; rab GTP-Binding Proteins/metabolism; Amino Acid Sequence; Amino Acids/analysis; Binding Sites; Computational Biology/methods; Databases, Protein/statistics & numerical data; Deep Learning; Humans; rab GTP-Binding Proteins/genetics
17.
Anal Biochem ; 571: 53-61, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30822398

ABSTRACT

An enhancer is a short (50-1500 bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers in genomics, the classification of enhancers has become a popular area of research in computational biology. Although a few computational tools have been employed to address this problem, their performance still requires improvement. In this study, we represent enhancers by word embeddings, including sub-word information of their biological words, which then serve as features to be fed into a support vector machine algorithm to classify them. We present iEnhancer-5Step, a web server containing two-layer classifiers to identify enhancers and their strength. We attain an independent test accuracy of 79% and 63.5% in the two layers, respectively. Compared to current predictors on the same dataset, our proposed method yields superior performance. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/.
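
The two-layer arrangement described in this abstract (first enhancer versus non-enhancer, then enhancer strength) can be sketched as a simple cascade. The two scoring callables below are hypothetical stand-ins for trained classifiers, and the 0.5 threshold and the strong/weak labels are assumptions made for illustration.

```python
from typing import Callable


def two_layer_predict(seq: str,
                      enhancer_score: Callable[[str], float],
                      strength_score: Callable[[str], float],
                      threshold: float = 0.5) -> str:
    """Layer 1 decides enhancer vs. non-enhancer; layer 2 runs only on enhancers."""
    if enhancer_score(seq) < threshold:
        return "non-enhancer"
    return "strong enhancer" if strength_score(seq) >= threshold else "weak enhancer"
```

Because the second layer only sees sequences accepted by the first, errors in layer 1 propagate to layer 2, which is one reason per-layer accuracies are reported separately.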


Subject(s)
Computational Biology; DNA/genetics; Enhancer Elements, Genetic/genetics; Support Vector Machine; Humans; Sequence Analysis, DNA
18.
Anal Biochem ; 555: 33-41, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29908156

ABSTRACT

Deep learning has been increasingly used to solve a number of problems with state-of-the-art performance in a wide variety of fields. In biology, deep learning can be applied to reduce feature extraction time and achieve high levels of performance. In our present work, we apply deep learning via two-dimensional convolutional neural networks and position-specific scoring matrices to classify Rab protein molecules, which are main regulators in membrane trafficking for transferring proteins and other macromolecules throughout the cell. The functional loss of specific Rab molecular functions has been implicated in a variety of human diseases, e.g., choroideremia, intellectual disabilities, cancer. Therefore, creating a precise model for classifying Rabs is crucial in helping biologists understand the molecular functions of Rabs and design drug targets according to such specific human disease information. We constructed a robust deep neural network for classifying Rabs that achieved an accuracy of 99%, 99.5%, 96.3%, and 97.6% for each of four specific molecular functions. Our approach demonstrates superior performance to traditional artificial neural networks. Therefore, from our proposed study, we provide both an effective tool for classifying Rab proteins and a basis for further research that can improve the performance of biological modeling using deep neural networks.


Subject(s)
Cell Membrane/metabolism; Choroideremia/metabolism; Intellectual Disability/metabolism; Machine Learning; Models, Biological; Neoplasm Proteins/metabolism; Neoplasms/metabolism; Neural Networks, Computer; rab GTP-Binding Proteins/metabolism; Humans; Protein Transport
19.
J Comput Chem ; 38(23): 2000-2006, 2017 09 05.
Article in English | MEDLINE | ID: mdl-28643394

ABSTRACT

In recent years, deep learning has become a modern machine learning technique used in a variety of fields with state-of-the-art performance. Therefore, using deep learning to enhance performance is also an important solution for the current bioinformatics field. In this study, we use deep learning via convolutional neural networks and position-specific scoring matrices to identify electron transport proteins, which carry out an important molecular function in transmembrane proteins. Our deep learning method yields a precise model for identifying electron transport proteins, with a sensitivity of 80.3%, specificity of 94.4%, and accuracy of 92.3%, and an MCC of 0.71 on an independent dataset. The proposed technique can serve as a powerful tool for identifying electron transport proteins and can help biologists understand the function of electron transport proteins. Moreover, this study provides a basis for further research that can enrich the field of applying deep learning in bioinformatics. © 2017 Wiley Periodicals, Inc.
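
The Matthews correlation coefficient (MCC) reported here alongside sensitivity, specificity, and accuracy combines all four cells of the confusion matrix into a single value between -1 and 1. The following is a minimal sketch of the standard formula. The function name is hypothetical, and returning 0.0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays low when a model performs well on only one of two imbalanced classes, which is why it is often reported next to sensitivity and specificity.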

...